Journal of Vision
● Association for Research in Vision and Ophthalmology (ARVO)
Preprints posted in the last 30 days, ranked by how well they match Journal of Vision's content profile, based on 92 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Shurygina, O.; Wirth, L. A.; Rolfs, M.; Ohl, S.
Saccades made during memory maintenance prioritize memory for the saccade target, but it is unclear if this benefit is specific to a location or extends across memorized objects. In three experiments, we examined whether saccadic selection spreads to other locations within the same object. In Experiment 1, we asked observers to remember three oriented Gabors presented either within contour-defined objects or without object structure. A subsequent movement cue prompted observers to move their eyes to the indicated location. We then probed memory for stimuli at locations equidistant from the saccade target, in either the same or a different object. Memory was best for stimuli at locations congruent with the saccade target, and consistently weaker for other stimuli presented in the same or a different object than the saccade target. In Experiment 2, we created more complex objects by adding more object features to the stimulus. Again, memory performance was best for stimuli congruent with the saccade target location, whereas memory in incongruent trials was worse and similar for stimuli in the same and different object as the saccade target. In Experiment 3, we tested if saccadic selection is present and propagates within the object in a change detection task. Again, memory performance (i.e., change detection) was best at the saccade target location. However, this memory benefit also spread to other locations within the same object. Our results imply that saccadic selection in visual working memory is primarily space-based but can also spread towards locations within the object where a saccade was directed.
Sun, H.; Birney, A.; Singh, N.; Olszko, A.; Chen, P.; Ke, J.; Rosenberg, M. D.; Jangraw, D. C.
Mind-wandering (MW) is a frequent and pervasive phenomenon, yet it is commonly assessed using self-reports or probe-based methods that offer limited temporal precision regarding its onset. In this study, we introduce a novel paradigm, ReMind, that estimates the onset and duration of MW episodes during natural reading by combining retrospective self-reports with eye-tracking. Participants indicated the words where they believed their mind started and stopped wandering, and these reports were aligned with gaze timestamps to estimate MW onset. Using data from 44 participants, we examined whether knowledge of MW onset improves the detection of MW from eye-tracking signals. To evaluate relevance for both self-report and thought-probe paradigms, we additionally simulated thought probes by randomly sampling time points during reading. Logistic regression classifiers trained on eye-tracking features extracted from time windows anchored to MW onset achieved AUROC scores of 0.659 and 0.621 under the self-report and simulated thought-probe paradigms, respectively, using leave-one-subject-out cross-validation. In both cases, onset-aligned windows outperformed classifiers trained using arbitrary MW windows. Sliding-window analyses further revealed systematic temporal changes around MW onset, with classification performance peaking at approximately 3 seconds after onset. Feature-level analyses showed reduced fixation rate and fixation dispersion, along with increased pupil size following MW onset. Together, these findings characterize the temporal progression from on-task reading to MW. Overall, ReMind provides a useful framework for studying the temporal dynamics of MW during naturalistic reading.
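The evaluation scheme described above (leave-one-subject-out cross-validation scored by AUROC) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the helper names `leave_one_subject_out` and `auroc`, the subject IDs, and the toy scores are all hypothetical, and the classifier itself is omitted.

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    by_subject = defaultdict(list)
    for i, s in enumerate(subject_ids):
        by_subject[s].append(i)
    for s, test_idx in by_subject.items():
        # Every sample from the held-out subject is excluded from training.
        train_idx = [i for i, sid in enumerate(subject_ids) if sid != s]
        yield s, train_idx, test_idx

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: three subjects with two samples each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))

# Toy classifier scores (assumed): 3 of 4 positive/negative pairs are ranked correctly.
toy_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Holding out whole subjects, rather than random samples, keeps a participant's gaze idiosyncrasies from leaking between training and test data.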
Khan, R.; Bekiari, S.; Hierck, B.; Salvatori, D.; Kenemans, L.
Mental rotation in 3D is a key cognitive skill involving dynamic spatial transformations, for which pronounced individual differences have been documented. Here we ask whether individual differences in 3D abilities can be explained by analogous differences in 2D abilities. 3D mental rotation was assessed by the Vandenberg & Kuse Mental Rotation Test (3D-MRT) and examined for association with performance and underlying electrocortical mechanisms during a 2D letter rotation task. Participants (N=40) first completed the MRT and then performed a computerized 2D letter rotation task in which they had to identify whether letters were oriented in a standard or a mirrored direction (parity judgment) when rotated at 0°, 60°, 120°, and 180° while EEG was recorded. Reaction times (RTs) and error rates increased with angular disparity. The angular disparity effect on RT was smaller for mirrored letters. Low, relative to high, 3D-MRT scoring participants showed more pronounced accuracy declines at higher rotation angles. An EEG event-related potential (ERP) known as the Rotation-Related Negativity (RRN) became more pronounced with increasing angular disparity. High 3D-MRT scores were associated with a stronger RRN response at central-parietal sites. In addition, the ERP-P3b wave was more pronounced at central-parietal sites for low 3D-MRT scorers, independent of angular disparity. It is concluded that 3D rotational ability is positively associated with 2D mental rotation performance, and more strongly with enhanced recruitment of neural visual-spatial cortical representations than with enhanced recruitment of more general cognitive resources.
Geisler, W. S.; Das, A.
The human visual system segments images using both high-level recognition mechanisms and low-level mechanisms that are largely independent of specific prior experience. The low-level mechanisms are essential for initiating recognition processes, and for learning to recognize new materials, objects, and contexts. Here we describe a hierarchical Bayesian observer (HBO) model of texture segmentation that is biologically plausible, takes into account the statistics of natural scenes, and does not depend on prior experience. The HBO model consists of five steps: local similarity grouping with local normalization, mutual similarity grouping (local grouping is strengthened if the neighboring regions are similar to the same set of other regions), transitive grouping (good continuation), confidence grouping (neighboring regions far from the same-different decision boundary guide grouping of regions near the decision boundary), and region grouping (similarity grouping of the regions from the initial segmentation). We find that a local similarity grouping process, trained to maximize accuracy, predicts human texture discrimination accuracy. We then find that the four additional steps accurately segment images with randomly shaped regions containing arbitrary natural textures. The success of the model depends on all the steps, but especially on local-similarity and transitive grouping. We also find that the transitive grouping allows correct segmentation of non-stationary texture regions (e.g., textures slanted in depth). Further, we find that when illumination varies across the image, local normalization enables both correct texture segmentation and estimation of illumination change. Finally, we find that, unlike our model, large state-of-the-art deep networks often fail on these stimuli.
Yu, Y.; Hafed, Z. M.
Visual response strength in the primate superior colliculus (SC) has recently been shown to inversely correlate with trial-by-trial saccadic reaction time in a much stronger way than visual response strength in the primary visual cortex (V1). However, for any given visual stimulus onset, populations of neurons in each brain area are concurrently activated, leaving open the question of how V1 visual response strength can predict trial-by-trial saccadic reaction time when multiple simultaneously recorded neurons are taken into account. Using a classic visually-guided saccade task, here we assessed the quality of predicting trial-by-trial saccadic reaction time from the visual response strengths of 1 to 10 simultaneously recorded neurons in each brain area. For each session, we modeled saccadic reaction time as a weighted linear combination of the visual response strengths of N simultaneously recorded neurons. Consistent with the prior work, the visual response strength of a single SC neuron was better than that of a single V1 neuron at predicting reaction time. By adding more simultaneously recorded neurons, the prediction got much better in the SC, but not in V1. Only for 100% contrast dark stimuli (darker in luminance than the surrounding gray background) did V1 show an increase in prediction quality with more simultaneously recorded neurons. This increase, which was still substantially weaker than in the SC, could reflect the preference of V1 neurons for dark contrasts. These results suggest that despite qualitative similarities between SC and V1 visual responses, SC visual responses are functionally reformatted from their V1 counterparts.
Significance: The superior colliculus (SC) is an important sensory-motor structure for controlling eye movements, and it receives a significant portion of its inputs directly from the primary visual cortex (V1). Despite this, SC visual responses are much better correlated with trial-by-trial variability in saccadic eye movement timing than V1 visual responses, and this effect is strongly amplified when considering simultaneously recorded neurons. Thus, SC and V1 visual responses serve fundamentally different functions from a motor perspective.
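One way to write down the "weighted linear combination" model mentioned in the abstract is given below. The intercept $w_0$, the noise term $\varepsilon_t$, and the symbol names are assumptions made for illustration; the abstract does not state how the weights were fitted.

```latex
\mathrm{RT}_t \;=\; w_0 + \sum_{i=1}^{N} w_i \, r_{i,t} + \varepsilon_t ,
\qquad N \in \{1, \dots, 10\}
```

Here $\mathrm{RT}_t$ is the saccadic reaction time on trial $t$, $r_{i,t}$ is the visual response strength of the $i$-th simultaneously recorded neuron on that trial, and $w_i$ is that neuron's weight. Given the reported inverse correlation between response strength and reaction time, the fitted $w_i$ would be expected to be mostly negative.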
Tasliyurt-Celebi, S.; de Haas, B.; L.-H. Vo, M.; Dobs, K.
Human perception is shaped by both sensory input and prior knowledge or expectations. But how does prior contextual information influence rapid visual processing? Here, we combined eye tracking with feature-based encoding models across two experiments to predict detection latencies in a core visual task: rapid face detection in natural scenes (N = 38 per experiment). In the first experiment, we manipulated the presence of faceless scene previews. In the second experiment, we additionally restricted peripheral visual input using a moving-window paradigm, thereby increasing reliance on prior information. Across both experiments, prior context facilitated face detection, particularly for challenging images. This facilitation was already evident in the very first eye movement, suggesting that previews shape perceptual strategies from the outset. To quantify what information guided behavior, we modeled detection latencies using a set of image-based predictors capturing (i) sensory information and (ii) a scene-derived spatial prior: the expected face location. Both predictor classes explained latency variation across images. Among sensory predictors, the difference in deep neural network responses induced by the presence of the face provided the strongest out-of-sample prediction of detection latency. Critically, when scene previews were available, the contribution of the spatial prior increased, while reliance on sensory-driven features was generally reduced. Together, these findings indicate that prior scene context shifts the balance of information used for rapid face detection from sensory-driven to expectation-based spatial guidance.
Kumarasinghe, A.; Bui, V.; Ghanbarzadeh, R.
Skin-tone labels are absent from public dermoscopy benchmarks such as the International Skin Imaging Collaboration (ISIC), making it impossible to audit whether clinical AI performs equitably across skin tones. While several recent works estimate skin tone automatically from clinical photography and selfies, we ask whether this approach is feasible on dermoscopy, the primary imaging modality of these benchmarks. To answer this, we make three main contributions. First, we release MST-Derm, a dual-rater Monk Skin Tone (MST) annotation benchmark on 500 ISIC 2018 images. Raters were given an explicit unrateable option for crops where the skin surrounding the lesion was too occluded to label confidently. We find that 60% of images were marked unrateable, yielding a 193-image consensus subset (quadratic-weighted Cohen's Kappa = 0.82). Second, we conduct a systematic feasibility study of three pixel-based MST annotation pipelines spanning the principal families in prior work: palette matching in perceptual colour space, robust colour statistics, and projection to a 1D colorimetric scalar. All three pipelines produce ordinal signal above chance (95% confidence intervals on quadratic-weighted Kappa exclude zero). However, ISIC 2018's extreme light-skin bias leaves 82% of the evaluation set at MST 2, giving a constant "always predict MST 2" baseline an accuracy floor the methods cannot overcome. To separate algorithmic signal from dataset bias, we evaluate on a class-balanced subset. The best method reaches quadratic-weighted Kappa = 0.43 against the trivial baseline of Kappa = 0.00, confirming the signal is genuine. Third, we diagnose this performance ceiling. We trace the bottleneck to two causes: dermoscopy's specialised illumination physically compresses the colour range on which lighter skin tones differ, and ISIC's dataset skew makes standard absolute-accuracy metrics uninformative. 
We conclude that while pixel-based colour features carry real MST signal on dermoscopy, current performance is insufficient for autonomous annotation. We release the benchmark, annotation protocol, all prediction runs, and analysis code to facilitate the development of robust skin-tone estimators, a vital prerequisite for accurately auditing fairness and mitigating bias in dermatological machine learning.
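The point that a constant predictor scores zero on quadratic-weighted Kappa, however high its raw accuracy, can be checked with a small sketch. The function below uses the standard definition of quadratic-weighted Cohen's Kappa; the 0-indexed four-class labels and the toy data are assumptions, not the MST-Derm annotations.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's Kappa with quadratic disagreement weights; labels are 0..n_classes-1."""
    n = len(rater_a)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the squared ordinal distance.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[i][j]          # observed weighted disagreement
            den += w * row[i] * col[j] / n     # disagreement expected by chance
    if den == 0:
        return float("nan")  # undefined when both raters use one identical class
    return 1.0 - num / den

# Hypothetical class-balanced ground truth over four ordinal classes.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
constant = [1] * len(truth)  # "always predict one class" baseline

kappa_constant = quadratic_weighted_kappa(truth, constant, 4)
kappa_perfect = quadratic_weighted_kappa(truth, truth, 4)
```

For the constant baseline the observed and chance-expected disagreement coincide, so Kappa is 0, while perfect agreement gives 1. This is why a chance-corrected metric stays informative on a skewed label distribution where raw accuracy does not.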
Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.
Quantitative eye movement analysis is important for neurological diagnostics, yet existing video-oculography (VOG) systems typically require calibration, device-specific settings, or accurate gaze labels. We present VOGeo-Gaze, a real-time, calibration-free, geometry-aware neural network that estimates gaze by reconstructing anatomically meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold-standard VOG system. It achieved median absolute angular errors of 0.33° horizontally and 0.35° vertically, with nearly 92% of recordings below 1° error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and interpretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.
Gadari, A.; Vichare, A. A.; Corona, F.; Vupparaboina, S. C.; Lall, S. R.; Gregori, G.; Hasan, N.; Sahel, J.-A.; Chhablani, J.; Bollepalli, S. C.; Vupparaboina, K. K.
Manufacturer-defined signal-strength indices are frequently employed as quality benchmarks for automated optical coherence tomography analysis, yet their empirical relationship with deep learning segmentation accuracy remains unclear. Because these metrics were originally developed for conventional image-processing pipelines, their ability to predict modern model-based segmentation accuracy has not been empirically validated. To address this gap, we evaluated the Heidelberg Spectralis Q-score against U-Net segmentation performance across 5,047 B-scans from 103 eyes for three anatomical boundaries of the posterior segment of the eye: the Ellipsoid Zone (EZ), Bruch's Membrane (BM), and Choroid Outer Boundary (COB). Alongside standard boundary agreement metrics (MAE, MSE, Dice Similarity Coefficient), we adapted the Earth Mover's Distance (EMD) from optimal transport theory as a boundary evaluation metric. Unlike column-wise averages, EMD quantifies boundary agreement as a 2-D geometric displacement, directly measuring residual spatial displacement between the model-segmented boundary and the ground-truth boundary. Our results demonstrate that the Q-score, originally designed to gate image-processing-based automated analysis, is a poor predictor of deep learning boundary segmentation accuracy, with explained variance (R²) failing to exceed 1.4% across all three boundaries. We further observed a monotonically increasing error hierarchy with anatomical depth (EZ < BM < COB), consistent across metrics, which signal strength does not explain. At the COB, correlations were paradoxically positive, explained by a B-scan-level mediation chain: higher Q-scores correspond to greater choroidal thickness (r=0.113, ρ=0.158), which in turn predicts higher COB segmentation error (r=0.165, ρ=0.191), a localization difficulty that global signal strength cannot capture. 
Collectively, these findings challenge the implicit assumption that signal-strength-based quality thresholds are a reliable proxy for deep learning model performance, and motivate a shift toward task-specific acquisition quality criteria calibrated to model performance rather than signal interpretability.
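To put the reported effect sizes on a common scale, the short calculation below converts between Pearson's $r$ and explained variance. It assumes the reported R² comes from a single-predictor linear fit, in which case $R^2 = r^2$; the abstract does not state how R² was computed.

```latex
R^2 \le 0.014 \;\Longrightarrow\; |r| \le \sqrt{0.014} \approx 0.118 ,
\qquad
0.113^2 \approx 0.0128 \ (1.3\%) ,
\qquad
0.165^2 \approx 0.0272 \ (2.7\%)
```

Under that assumption, each link of the mediation chain (Q-score to choroidal thickness, thickness to COB error) also accounts for only a few percent of variance.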
Mugleston, J. D.; Huang, S.-M.; Dahl, C. D.
Human pointing is often used to test whether dogs extract object-specific information from human communicative cues. However, above-chance responses in standard object-choice tasks do not by themselves distinguish between a referential interpretation, in which the gesture identifies a specific target, and an attentional interpretation, in which it primarily biases behaviour toward a broader spatial region. We addressed this issue using an asymmetric six-cup arrangement designed to separate coarse side guidance from exact cup localisation more clearly than a symmetric multi-cup design. Performance in domestic dogs was analysed using three measures: the probability of reaching the correct side, the probability of choosing the correct cup overall, and the probability of choosing the correct cup conditional on having first reached the correct side. The principal comparison involved three matched trial classes: the symmetric 3-vs-3 condition, 2-vs-4 trials with the baited cup on the 2-cup side, and 2-vs-4 trials with the baited cup on the 4-cup side. Descriptively, pointing trials exceeded matched no-point control trials more clearly for side selection than for overall cup choice. The clearest condition effect was observed at the level of side guidance. Dogs were most likely to reach the correct side when the baited cup was located on the 4-cup side of the unequal arrangement. Mixed-effects models confirmed a reliable group effect for side accuracy, whereas overall cup accuracy showed only a weaker and less robust condition effect, and within-side localisation revealed no reliable group difference once condition-specific chance baselines were taken into account. A complementary generative model comparison converged on the same conclusion: a referential-only model fit poorly, an attention-only model captured most of the grouped outcome structure, and a combined model yielded only a modest improvement. 
Dog point-following is therefore best understood as a layered process dominated by attentional guidance, with only limited additional target-specific localisation.
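The three measures used in the analysis are linked by P(correct cup) = P(correct side) × P(correct cup | correct side). A minimal sketch with made-up trials shows the decomposition. The trial tuples, the cup numbering, and the uniform within-side chance level are assumptions for illustration, not the study's data or its exact baseline definition.

```python
def accuracy_measures(trials):
    """trials: list of (chosen_side, chosen_cup, baited_side, baited_cup)."""
    n = len(trials)
    side_hits = [t for t in trials if t[0] == t[2]]
    cup_hits = [t for t in side_hits if t[1] == t[3]]
    p_side = len(side_hits) / n
    p_cup = len(cup_hits) / n
    # Conditional accuracy: exact cup choice among trials that reached the correct side.
    p_cup_given_side = len(cup_hits) / len(side_hits) if side_hits else float("nan")
    return p_side, p_cup, p_cup_given_side

def within_side_chance(cups_on_baited_side):
    """Chance of picking the baited cup if choice within a side were uniform (assumed)."""
    return 1.0 / cups_on_baited_side

# Hypothetical 2-vs-4 layout: left cups 1-2, right cups 3-6, baited cup 4 on the right.
trials = [
    ("R", 4, "R", 4), ("R", 4, "R", 4),                      # correct side, correct cup
    ("R", 3, "R", 4), ("R", 5, "R", 4),                      # correct side, wrong cup
    ("R", 6, "R", 4), ("R", 3, "R", 4),
    ("L", 1, "R", 4), ("L", 2, "R", 4),                      # wrong side
]
p_side, p_cup, p_cup_given_side = accuracy_measures(trials)
chance = within_side_chance(4)
```

In this toy set the side measure is high while the conditional cup measure sits close to the within-side chance level, which is the kind of pattern an attention-dominated account would produce.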
Baek, J. S.; Lokhande, A.; Neuenschwander, D.; Shi, M.; Wang, M.
Purpose: To investigate the relative efficacy of nine distinct visual field (VF) denoising artificial intelligence (AI) methods and a pathology-aware AI strategy to discourage over-correction of glaucomatous defects. Design: Retrospective study. Participants: 87,940 paired visual field (VF) and optical coherence tomography (OCT) samples from a tertiary academic center. Methods: Denoising models were trained on a separate VF-only dataset and evaluated on an independent structure-function dataset of paired VF-OCT samples. We implemented and evaluated nine distinct VF denoising strategies representing three broad categories: baseline measurements, self-supervised and image restoration models (including Noise2Noise, Noise2Void, and NAFNet), and latent variable compression-based models (autoencoders and variational autoencoders). All models were designed to reconstruct VF sensitivity maps. We then predicted retinal nerve fiber layer thickness (RNFLT) maps from the denoised VFs using a fixed, independently trained VF-to-RNFLT prediction model. Main Outcome Measures: Predicted VF and RNFLT maps and resultant evaluation metrics. Results: The raw VF baseline achieved a global R² of 0.5468 and MAE of 16.83 µm. Restoration-based models maintained or slightly improved concordance, with the pathology-aware NAFNet achieving the highest global R² of 0.5485 and a comparable MAE of 16.82 µm. In contrast, compression-based models degraded concordance, with CNN-VAE showing a significant reduction (R² approximately 0.50). In severe glaucoma, concordance decreased across all methods; however, compression architectures exhibited disproportionately greater degradation compared with restoration-based approaches. Conclusions: We present a comparative benchmark of AI-based VF denoising strategies paired with structure-function evaluation. While restoration-based models can reduce variability without loss of biological signal, latent compression risks attenuating clinically meaningful defects. Visually smoother fields are not necessarily more biologically accurate.
Nakao, A.; Yamada, N.; Wakatsuki, T.
Internal forward models predict the sensory consequences of motor commands; however, whether the anticipated availability of post-action feedback contributes to the precision of the action itself remains unknown. We manipulated the predictability of post-release visual occlusion in skilled basketball players. Participants performed three-point shots while wearing liquid-crystal shutter goggles. The study tested three conditions: a no-occlusion baseline, a certain-occlusion condition in which players knew that their vision would be occluded at ball release in every trial, and a random-occlusion condition in which they could not predict whether an occlusion would occur. Shooting accuracy declined from 49.2% in the no-occlusion condition to 41.7% in the certain-occlusion condition. The random-occlusion condition did not differ from the baseline (46.1%). Within the random condition, accuracy in occluded trials was virtually identical to that in non-occluded trials (46.6% vs 46.2%), even though the immediate visual occlusion was the same as in the certain-occlusion condition. These results demonstrate that it is not the absence of post-action information per se that disrupts motor execution, but the prior certainty that action consequences will be unavailable. We interpret this finding as a prospective influence of anticipated consequence loss, whereby motor execution depends on whether the prediction-outcome loop remains closable.
Maldonado, M.; Dinc, O. F.; Lacin, M. E.; Connor, T.; Bell, F.; dinc, b.; Ozdemirli, K.; Yildirim, M.
Objective: Simultaneous recording of brain activity, behaviour, and virtual environments is essential for understanding large-scale neural dynamics during behaviour. However, existing systems often rely on software-based synchronization or post hoc alignment, introducing latency, jitter, and drift that obscure fast brain-behaviour interactions. Approach: Here, we present a deterministically synchronized widefield calcium imaging platform that unifies neural imaging, high-speed behavioural monitoring, and closed-loop virtual reality (VR) under a shared hardware-defined clock. This system enables millisecond-precision temporal alignment across modalities, including dual-wavelength hemodynamic correction, pupil and orofacial tracking, locomotion sensing, and VR rendering. Main results: The platform achieves stable hardware-level synchronization across neural imaging, behavioural recordings, and VR rendering without reliance on software timestamps. It supports widefield imaging rates up to 100 Hz and integrates seamlessly with both ViRMEn and Blender VR engines, exhibiting a mean locomotion-to-VR update latency of ~1.5 ms. Multimodal recordings during VR navigation demonstrate robust temporal alignment between cortical activity, facial dynamics, pupil signals, and locomotion. Significance: This system provides a deterministic multimodal framework for studying brain-behaviour relationships during active behaviour. By synchronizing neural imaging, behaviour, and virtual environments at millisecond precision, the platform enables causal investigation of brain-behaviour interactions and provides a foundation for next-generation closed-loop neuroengineering experiments.
Qiu, N.; Allenmark, F.; Chen, S.; Müller, H. J.; Shi, Z.
Real-world distractors occur in environments whose states change at different rates. We asked whether such volatility alters early attentional gating or instead changes the criterion for committing to a response. Observers performed an additional-singleton search task with concurrent eye tracking while distractor presence followed high- or low-volatility sequences, with overall distractor prevalence held constant. Trial-pooled oculomotor capture was higher under high volatility, a pattern that appears to indicate altered filtering. That inference did not survive repetition-aware analysis: once the same-location run position was matched, capture did not detectably differ across volatility regimes. The pooled capture effect was therefore consistent with a structural consequence of the volatility manipulation, which enriched high-volatility blocks with early-run positions where capture is intrinsically high. The positive volatility signature appeared on distractor-absent trials, where high-volatility blocks were associated with longer target latency, more fixations, longer final-target dwell, and fewer errors. Same-location repetition learning showed no detectable difference in slope across regimes. A hierarchical drift-diffusion model (DDM) and a complementary volatility Kalman-filter (VKF) dynamic-state comparison indicated that manual responses were better described by architectures that allow both boundary-related and drift-related components than by a boundary-only account. Volatility, therefore, did not show detectable evidence of impairing the local gating rule; instead, the converging evidence points to a post-selective verification/caution profile, consistent with a precision-weighted read-out of environmental uncertainty.
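The distinction between boundary-related and drift-related components can be made concrete with a basic two-boundary diffusion simulation. This is a minimal sketch, not the hierarchical DDM fitted in the study: the function name `simulate_ddm`, the parameter values, and the omission of non-decision time are all assumptions.

```python
import random

def simulate_ddm(drift, boundary, n_trials, rng, dt=0.005, noise=1.0):
    """Simulate a symmetric two-boundary diffusion; return (mean RT in s, accuracy)."""
    total_time, n_correct = 0.0, 0
    step_sd = noise * dt ** 0.5  # standard deviation of one Euler step
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        # Accumulate noisy evidence until either boundary (+/- boundary) is reached.
        while abs(x) < boundary:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        total_time += t
        n_correct += x > 0  # the upper boundary is the correct response
    return total_time / n_trials, n_correct / n_trials

rng = random.Random(0)  # fixed seed so the run is repeatable
# Same drift (evidence quality), different boundary separation (response caution).
rt_low, acc_low = simulate_ddm(drift=1.0, boundary=1.0, n_trials=1000, rng=rng)
rt_high, acc_high = simulate_ddm(drift=1.0, boundary=2.0, n_trials=1000, rng=rng)
```

Raising the boundary while holding drift fixed lengthens decision times and reduces errors, the same qualitative pattern as the longer latencies and fewer errors reported on distractor-absent trials under high volatility. A drift change alone would instead shift speed and accuracy in the same direction.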
Bleau, M.; Dessain, Q.; Dricot, L.; Nemargut, J. P.; Kupers, R.; Ptito, M.
Cognitive maps encode spatial relationships between locations and support flexible navigation. However, how these mental representations form in the absence of visual experience remains unclear. Here, we introduce a multisensory virtual navigation paradigm that allows tracking the temporal dynamics of non-visual cognitive map formation. Sixteen early blind (EB), 17 late blind (LB), and 29 sighted controls (SC) learned the layout of a tactile maze. Participants repeatedly performed virtual pointing (estimating directions between locations) and navigation (reaching locations) tasks, which measured cognitive maps across multiple stages of learning. This method also enabled algorithmic inference of cognitive maps, providing insights into how mental distortions are progressively corrected. Although there were no group differences in average navigation performance, EB showed slower knowledge accumulation compared to LB and SC. In addition, both EB and LB had difficulties translating cognitive maps into first-person perspectives, resulting in reduced pointing and cognitive map accuracy. Yet, cognitive map accuracy improved progressively in all groups and a subset of EB and LB achieved expert-level performance with high navigation and pointing precision. In sum, this study provides a scalable framework for tracking alterations in cognitive map formation in blindness and other neurological conditions. Importantly, it demonstrates that cognitive map formation in the absence of vision is experience-dependent and trainable. Spatial disadvantages often observed in EB and LB thus do not reflect cognitive deficits but result from adaptive behavioral strategies constraining the use of allocentric cognitive maps.
Turski, J.
In previous studies by the author on binocular vision with the asymmetric eye (AE), which models a healthy human eye with misaligned optical components, the results were primarily presented in the Rodrigues vector (RV) framework and supported by simulations and 3D visualizations in GeoGebra's dynamic geometry environment. In this paper, the novel geometric kinematics of the human eye, that is, the eye with misaligned optics, and simplified assumptions about the eye rotations (the eye's translational movements are disregarded), are developed within the framework of rigid-body rotations. The originality of the analysis lies in a precise geometric decomposition of a full rotation of the eye's posture into a torsion-free rotation (the geodesic part) and a torsional rotation (the non-geodesic extension of the geodesic part). This decomposition is extended to the corresponding decomposition of the angular velocity. A novel derivation of the eye's angular velocity from the RV formulation of the eye kinematics is proposed.
Geisler, W. S.
Perceptual systems in humans and many other animals are able to segment scenes into regions that are likely to be physically meaningful. This ability depends on having low-level mechanisms that can accurately categorize whether local image patches are samples from the same or different kinds of texture. We find that using spatial proximity as a proxy for same-different ground truth makes it possible to train accurate decision variables and bounds directly from arbitrary natural images with no feedback. We also find that performance can be further improved by using proximity as a ground truth for adjusting the final decision variables and bounds for the current image/scene. These surprising findings result from the simple fact that under a wide range of conditions proximity discrimination (near vs. far) and texture discrimination (same vs. different) have mathematically identical decision bounds if the same image features are used for both tasks. We used the decision variables and bounds trained on natural images as the initial steps in a hierarchical Bayesian observer (HBO) model of texture discrimination [9]. Given the relative simplicity of this HBO model, it did an excellent job of segmenting images having randomly shaped regions containing arbitrary natural textures. We suggest that the proximity proxy is something that natural selection could discover and exploit for any same-different task where the task-relevant stimulus features also vary systematically with distance in space and/or time. For example, natural selection could have created developmental learning/plasticity mechanisms that exploit the proximity proxy.
Yang, L.; Katada, Y.; Fujinami, K.; Yamamoto, S.; Fukuda, K.; Shinojima, A.; Tomita, Y.; Ban, N.; Shinoda, H.; Negishi, K.; Kurihara, T.
Purpose: Assessing visual function in patients with ultralow vision (ULV), particularly those with retinitis pigmentosa (RP), remains a significant challenge in therapeutic development. Full-field stimulus test (FST) provides a quantitative measure of retinal light sensitivity and may serve as a valuable clinical endpoint. We investigated FST in ULV RP by examining its associations with functional measures and daily activity-based tasks. Design: Observational, cross-sectional study. Participants: Patients with RP and visual acuity in the worse-seeing eye below counting fingers (CF) were enrolled. Methods: After dilation and 45-minute dark adaptation, FST was performed monocularly with brief full-field white-light flashes across three visits. Visual acuity was classified into four groups: no light perception (NLP), light perception (LP), hand motion (HM), and CF or better. We assessed functional vision using two tabletop object-recognition and exploration tasks, two mobility tasks, and three vision-related questionnaires. FST thresholds were compared across visual acuity groups, and associations with functional outcomes were analyzed. Main Outcome Measures: FST thresholds and their associations with functional vision outcomes. Results: Thirty-five patients (70 eyes; median age, 62 years, range 39-84) were included. Median FST thresholds (log cd·s/m²) by visual acuity group were as follows: NLP, 1.13 (-0.63 to 2.54); LP, -0.27 (-2.70 to 2.91); HM, -1.13 (-6.24 to 0.51); CF or better, -2.82 (-5.67 to -1.73) (p < 0.001). Measurable FST thresholds were obtained in 9 of 14 NLP eyes (64.3%). FST thresholds showed significant correlations with tabletop performance (r = -0.70 to -0.46) and mobility performance (r = -0.65), whereas no significant association was observed with questionnaire scores. Test-retest variability across three visits showed no systematic bias, with a coefficient of repeatability of ±0.66 to ±0.82 log cd·s/m². ROC analyses identified FST cutoffs of -1.75 to -0.87 log cd·s/m² at which patients first achieved nonzero functional task performance. Conclusions: FST quantifies residual visual function in ULV RP and correlates strongly with performance-based measures of functional vision in daily life. These findings support FST as a clinically meaningful endpoint for therapeutic trials in advanced RP and other severe visual impairments and highlight the value of anchoring FST thresholds to functional task performance.
Huang, Z.; Dekker, T. M.; Crutch, S. J.; Yong, K. X. X.; Greenwood, J. A.
Incomplete letter recognition tasks are frequently used to detect visual deficits arising from neurodegenerative syndromes, including Posterior Cortical Atrophy (PCA; visual-variant Alzheimer's disease). A recent development of this approach is the Graded Incomplete Letters Test (GILT), which measures recognition thresholds for letters degraded by removing pixelated sections (decreasing completeness). Although GILT thresholds are strongly elevated in PCA relative to typical adults, the precise cortical visual impairments underlying these deficits are unclear, as is the potential contribution from age-related optical limitations. We compared candidate cortical factors (crowding and global integration) with optical limitations (blur and low contrast) by simulating these factors in typical adults (n=6) viewing incomplete letter stimuli. Participants identified foveally presented letters (12 alternatives), with completeness varied using QUEST. At baseline, thresholds averaged ~5% completeness. Optical factors were simulated by separately applying blur and lowered contrast. These factors had minimal effect on thresholds, except where blur/contrast levels approached visibility limits, where thresholds rose modestly but remained far below clinical levels in PCA. Cortical factors were simulated by increasing crowding (disruptions from clutter) through peripheral presentation, with global-integration impairments simulated by varying pixel size to alter the distribution of degradation (limiting spatial integration) or degrading letters dynamically with limited-lifetime pixels (limiting temporal integration). These manipulations substantially elevated thresholds, with combined crowding and global-integration impairments increasing thresholds to levels comparable with PCA. 
We conclude that impaired incomplete letter recognition is driven primarily by cortical rather than optical factors, and that neurodegenerative deficits may reflect the combined impact of multiple cortical limitations.
Super, R.; Bui, B. V.; Xie, J.; Bou-Antoun, P.; Scholz, L.; Jusuf, P. R.
Zebrafish (Danio rerio) are an important vertebrate model for vision and neuroscience research. In the larval stages, zebrafish begin to exhibit the optomotor response (OMR) to stabilize themselves in water, a behaviour that may be exploited in the laboratory to measure visual acuity. However, up to now, the measurement of the OMR in juvenile and adult zebrafish has been limited due to their behavioural complexity. Here, we optimize a protocol to assay zebrafish aged between 4 and 9 weeks post-fertilization, by displaying sinusoidal gratings parallel to the zebrafish eye to elicit a robust OMR. We assessed the visual spatial-frequency tuning function of an environmentally induced myopia model to confirm the sensitivity and robustness of the protocol. Additionally, we show the OMR is sensitive to the contrast and temporal resolution of the sinusoidal gratings. Furthermore, we found that the time between stimulus presentations impacts the spatial-frequency tuning function, likely because time is required for zebrafish to return to baseline swimming after the OMR is elicited. Finally, we found that the OMR after ten versus twenty seconds of stimulus onset appears comparable, indicating that a robust OMR can be elicited in zebrafish through relatively short stimulus presentations. Through the experiments conducted, we present an optimized protocol specific to zebrafish. The protocol may be used to follow the progression or treatment efficacy of progressive neurological disorders including specific visual disorders and higher brain functions with visual endophenotypes. Ultimately, this protocol allows for high-throughput robust measures of visual and neural function in zebrafish.